The Ultimate Playlist - Hustle & Heart 🎢🎧

Author

Dhruv Sharma

Published

April 22, 2025

🎧 Introduction

From millions of Spotify tracks and playlists, Hustle & Heart emerges as a curated sound journey built on energy, emotion, and authenticity. This project explores what makes songs stick, analyzing popularity, danceability, and musical DNA, before distilling it all into a final 12-track playlist that hits with both data and vibe.


⚙️ Setup: Load & Install Required Packages

This chunk ensures all necessary R packages are installed and loaded before running the rest of the analysis. ✅📦

Code
ensure_package <- function(pkg){
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
}

required_packages <- c(
  "dplyr", "stringr", "tidyr", "purrr", "readr", "jsonlite",
  "ggplot2", "scales", "DT", "rvest", "httr2", "tibble",
  "knitr", "kableExtra"  # used by spotify_table() below
)

invisible(lapply(required_packages, ensure_package))

options(dplyr.summarise.inform = FALSE)

🎧 Spotify Style Setup

This chunk sets a custom Spotify-themed style for all plots and tables to give the report a bold, immersive aesthetic. 🎨🟢🖤

Code
library(ggplot2)
library(kableExtra)

theme_spotify <- function() {
  theme_minimal(base_family = "Arial") +
    theme(
      plot.background = element_rect(fill = "#191414", color = NA),
      panel.background = element_rect(fill = "#191414", color = NA),
      panel.grid = element_line(color = "#1DB954", linewidth = 0.1),
      text = element_text(color = "white"),
      axis.title = element_text(face = "bold", color = "white"),
      axis.text = element_text(color = "#b3b3b3"),
      plot.title = element_text(size = 16, face = "bold", color = "#1DB954"),
      plot.subtitle = element_text(size = 12, color = "#b3b3b3")
    )
}

spotify_table <- function(df, caption_text = "") {
  knitr::kable(df, format = "html", caption = caption_text) |>
    kableExtra::kable_styling(
      full_width = TRUE,
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      position = "left",
      font_size = 14
    ) |>
    # Spotify-green header row with white text
    kableExtra::row_spec(0, background = "#1DB954", color = "white")
}

🎧 Task 1: Load Spotify Song Characteristics

In this first task, we download and clean a Spotify song characteristics dataset made available via GitHub. The dataset includes song-level features such as danceability, energy, valence, and more. Our goal is to create a clean, rectangular dataset where each row corresponds to a single artist-song pair.

id | name | duration_ms | release_date | year | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | mode | key | popularity | explicit | artist
6KbQ3uYMLKb5jDxLF7wYDD | Singende Bataillone 1. Teil | 158648 | 1928 | 1928 | 0.995 | 0.708 | 0.1950 | 0.563 | 0.1510 | -12.428 | 0.0506 | 118.469 | 0.7790 | 1 | 10 | 0 | 0 | Carl Woitschach
6KuQTIu1KoTTkLXKrwlLPV | Fantasiestücke, Op. 111: Più tosto lento | 282133 | 1928 | 1928 | 0.994 | 0.379 | 0.0135 | 0.901 | 0.0763 | -28.454 | 0.0462 | 83.972 | 0.0767 | 1 | 8 | 0 | 0 | Robert Schumann
6KuQTIu1KoTTkLXKrwlLPV | Fantasiestücke, Op. 111: Più tosto lento | 282133 | 1928 | 1928 | 0.994 | 0.379 | 0.0135 | 0.901 | 0.0763 | -28.454 | 0.0462 | 83.972 | 0.0767 | 1 | 8 | 0 | 0 | Vladimir Horowitz
6L63VW0PibdM1HDSBoqnoM | Chapter 1.18 - Zamek kaniowski | 104300 | 1928 | 1928 | 0.604 | 0.749 | 0.2200 | 0.000 | 0.1190 | -19.924 | 0.9290 | 107.177 | 0.8800 | 0 | 5 | 0 | 0 | Seweryn Goszczyński
6M94FkXd15sOAOQYRnWPN8 | Bebamos Juntos - Instrumental (Remasterizado) | 180760 | 9/25/28 | 1928 | 0.995 | 0.781 | 0.1300 | 0.887 | 0.1110 | -14.734 | 0.0926 | 108.003 | 0.7200 | 0 | 1 | 0 | 0 | Francisco Canaro
6N6tiFZ9vLTSOIxkj8qKrd | Polonaise-Fantaisie in A-Flat Major, Op. 61 | 687733 | 1928 | 1928 | 0.990 | 0.210 | 0.2040 | 0.908 | 0.0980 | -16.829 | 0.0424 | 62.149 | 0.0693 | 1 | 11 | 1 | 0 | Frédéric Chopin
6N6tiFZ9vLTSOIxkj8qKrd | Polonaise-Fantaisie in A-Flat Major, Op. 61 | 687733 | 1928 | 1928 | 0.990 | 0.210 | 0.2040 | 0.908 | 0.0980 | -16.829 | 0.0424 | 62.149 | 0.0693 | 1 | 11 | 1 | 0 | Vladimir Horowitz
6NxAf7M8DNHOBTmEd3JSO5 | Scherzo a capriccio: Presto | 352600 | 1928 | 1928 | 0.995 | 0.424 | 0.1200 | 0.911 | 0.0915 | -19.242 | 0.0593 | 63.521 | 0.2660 | 0 | 6 | 0 | 0 | Felix Mendelssohn
6NxAf7M8DNHOBTmEd3JSO5 | Scherzo a capriccio: Presto | 352600 | 1928 | 1928 | 0.995 | 0.424 | 0.1200 | 0.911 | 0.0915 | -19.242 | 0.0593 | 63.521 | 0.2660 | 0 | 6 | 0 | 0 | Vladimir Horowitz
6O0puPuyrxPjDTHDUgsWI7 | Valse oubliée No. 1 in F-Sharp Major, S. 215/1 | 136627 | 1928 | 1928 | 0.956 | 0.444 | 0.1970 | 0.435 | 0.0744 | -17.226 | 0.0400 | 80.495 | 0.3050 | 1 | 11 | 0 | 0 | Franz Liszt
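
For reference, the reshaping step behind rows like these could look as follows. This is a minimal sketch, not the post's actual loader: it assumes the raw file stores all artists of a song in a single `artists` column as a bracketed, quoted list (for example `['Robert Schumann', 'Vladimir Horowitz']`), and `split_artists` is a hypothetical helper name.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical helper: expand a list-like `artists` string into one row
# per artist-song pair, stored in a new `artist` column.
# Caveat: splitting on ", " would also break names that contain a comma.
split_artists <- function(songs) {
  songs |>
    mutate(artists = str_remove_all(artists, "^\\[|\\]$")) |>   # drop [ and ]
    separate_longer_delim(artists, delim = ", ") |>              # one row per artist
    mutate(artist = str_remove_all(artists, "^['\"]|['\"]$")) |> # drop outer quotes
    select(-artists)
}

# Toy input (assumed format), mirroring two songs from the table above
raw_songs <- tibble(
  id = c("6KuQTIu1KoTTkLXKrwlLPV", "6KbQ3uYMLKb5jDxLF7wYDD"),
  artists = c("['Robert Schumann', 'Vladimir Horowitz']", "['Carl Woitschach']")
)

songs_long <- split_artists(raw_songs)
songs_long
```

Note that a song with two credited artists yields two rows with the same `id`, which matters for any later join on that column.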

Task 2: Import Playlist Dataset

We responsibly download and combine all JSON playlist slices into a single list for future processing.

Code
load_playlists <- function() {
  library(jsonlite)
  library(purrr)
  
  dir_path <- "data/mp03/data1"
  if (!dir.exists(dir_path)) dir.create(dir_path, recursive = TRUE)
  
  base_url <- "https://raw.githubusercontent.com/DevinOgrady/spotify_million_playlist_dataset/main/data1/"
  starts <- seq(0, 999000, by = 1000)
  file_names <- sprintf("mpd.slice.%d-%d.json", starts, starts + 999)
  file_paths <- file.path(dir_path, file_names)
  
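  # download.file() has no `timeout` argument; the limit comes from options()
  old_opts <- options(timeout = 300)
  on.exit(options(old_opts), add = TRUE)

  for (i in seq_along(file_names)) {
    if (!file.exists(file_paths[i])) {
      url <- paste0(base_url, file_names[i])
      tryCatch({
        download.file(url, destfile = file_paths[i], mode = "wb")
      }, error = function(e) {
        message("⚠️ Failed to download: ", file_names[i])
        unlink(file_paths[i])  # remove any partial file so a rerun retries it
      })
    }
  }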

  read_playlist_file <- function(path) {
    tryCatch(
      fromJSON(path)$playlists,
      error = function(e) {
        message("❌ Skipping corrupted file: ", path)
        return(NULL)
      }
    )
  }

  valid_paths <- file_paths[file.exists(file_paths)]
  playlists_list <- map(valid_paths, read_playlist_file)
  playlists_list <- compact(playlists_list)
  
  return(playlists_list)
}

PLAYLISTS_LIST <- load_playlists()
all_playlists <- PLAYLISTS_LIST %>% list_rbind()
DT::datatable(
  head(all_playlists, 10),
  options = list(
    pageLength = 6,
    dom = 'tip',
    scrollX = TRUE
  ),
  class = "display compact stripe hover",
  rownames = FALSE
)

🎼 Task 3: Rectify Playlist Data to Track-Level Format

We flatten the hierarchical playlist JSONs into a clean, rectangular track-level format, stripping unnecessary prefixes and standardizing column names.

Code
strip_spotify_prefix <- function(x){
  # Drop everything up to the last colon: "spotify:track:<id>" -> "<id>".
  # (str_extract() would return the whole match, prefix included.)
  str_remove(x, "^.*:")
}

rectified_data <- all_playlists %>%
  select(
    playlist_name = name,
    playlist_id = pid,
    playlist_followers = num_followers,
    tracks
  ) %>%
  unnest(tracks) %>%
  # Number tracks within each playlist rather than across the whole table
  group_by(playlist_id) %>%
  mutate(playlist_position = row_number()) %>%
  ungroup() %>%
  mutate(
    artist_id = strip_spotify_prefix(artist_uri),
    track_id = strip_spotify_prefix(track_uri),
    album_id = strip_spotify_prefix(album_uri),
    duration = duration_ms
  ) %>%
  select(
    playlist_name, playlist_id, playlist_position, playlist_followers,
    artist_name, artist_id, track_name, track_id,
    album_name, album_id, duration
  )
spotify_table(head(rectified_data, 10))
playlist_name | playlist_id | playlist_position | playlist_followers | artist_name | artist_id | track_name | track_id | album_name | album_id | duration
Throwbacks | 0 | 1 | 1 | Missy Elliott | 2wIVse2owClT7go1WT98tk | Lose Control (feat. Ciara & Fat Man Scoop) | 0UaMYEvWZi0ZqiDOoHU3YI | The Cookbook | 6vV5UrXcfyQD1wu4Qo2I9K | 226863
Throwbacks | 0 | 2 | 1 | Britney Spears | 26dSoYclwsYLMAKD3tpOr4 | Toxic | 6I9VzXrHxO9rA9A5euc8Ak | In The Zone | 0z7pVBGOD7HCIB7S8eLkLI | 198800
Throwbacks | 0 | 3 | 1 | Beyoncé | 6vWDO969PvNqNYHIOW5v0m | Crazy In Love | 0WqIKmW4BTrj3eJFmnCKMv | Dangerously In Love (Alben für die Ewigkeit) | 25hVFAxTlDvXbx2X2QkUkE | 235933
Throwbacks | 0 | 4 | 1 | Justin Timberlake | 31TPClRtHm23RisEBtV3X7 | Rock Your Body | 1AWQoqb9bSvzTjaLralEkT | Justified | 6QPkyl04rXwTGlGlcYaRoW | 267266
Throwbacks | 0 | 5 | 1 | Shaggy | 5EvFsr3kj42KNv97ZEnqij | It Wasn't Me | 1lzr43nnXAijIGYnCT8M8H | Hot Shot | 6NmFmPX56pcLBOFMhIiKvF | 227600
Throwbacks | 0 | 6 | 1 | Usher | 23zg3TcAtWQy7J6upgbUnj | Yeah! | 0XUfyU2QviPAs6bxSpXYG4 | Confessions | 0vO0b1AvY49CPQyVisJLj0 | 250373
Throwbacks | 0 | 7 | 1 | Usher | 23zg3TcAtWQy7J6upgbUnj | My Boo | 68vgtRHr7iZHpzGpon6Jlo | Confessions | 1RM6MGv6bcl6NrAG8PGoZk | 223440
Throwbacks | 0 | 8 | 1 | The Pussycat Dolls | 6wPhSqRtPu1UhRCDX5yaDJ | Buttons | 3BxWKCI06eQ5Od8TY2JBeA | PCD | 5x8e8UcCeOgrOzSnDGuPye | 225560
Throwbacks | 0 | 9 | 1 | Destiny's Child | 1Y8cdNmUJH7yBTd9yOvr5i | Say My Name | 7H6ev70Weq6DdpZyyTmUXk | The Writing's On The Wall | 283NWqNsCA9GwVHrJk59CG | 271333
Throwbacks | 0 | 10 | 1 | OutKast | 1G9G7WwrXka3Z1r7aIDjI7 | Hey Ya! - Radio Mix / Club Mix | 2PpruBYCo4H7WOBJ7Q2EwM | Speakerboxxx/The Love Below | 1UsmQ3bpJTyK6ygoOOjG1r | 235213
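
Spotify URIs follow the pattern `spotify:<type>:<id>`, so it is worth checking a prefix-stripping helper on a couple of sample values before running it over the full table. The snippet below is a small sketch of such a check; `strip_uri_prefix` is a hypothetical name used only for this illustration. It also shows why `str_extract()` with a capture group is not enough on its own: by default it returns the entire match, not the group.

```r
library(stringr)

# Hypothetical helper: remove everything up to and including the last colon
strip_uri_prefix <- function(x) {
  str_remove(x, "^.*:")
}

uris <- c(
  "spotify:track:6I9VzXrHxO9rA9A5euc8Ak",
  "spotify:artist:26dSoYclwsYLMAKD3tpOr4",
  NA
)

stripped <- strip_uri_prefix(uris)

# For comparison: str_extract() returns the whole match, prefix and all
extracted <- str_extract(uris, ".*:.*:(.*)")

stripped
extracted
```

Missing values pass through as `NA`, so the check can be followed by a simple `filter(!is.na(track_id))`.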

🎧 Task 4: Initial Exploration of Track & Playlist Data

This section investigates core statistics of the combined playlist + song characteristics data set.

Code
# Drop any remaining "spotify:track:" prefix so track IDs match SONGS$id
strip_spotify_prefix <- function(x){
  stringr::str_replace(x, "spotify:track:", "")
}

rectified_data <- rectified_data %>%
  mutate(track_id = strip_spotify_prefix(track_id)) %>%
  filter(!is.na(track_id) & track_id != "")

SONGS <- SONGS %>%
  filter(!is.na(id) & id != "")

joined_data <- inner_join(rectified_data, SONGS, by = c("track_id" = "id"))
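
An inner join silently drops every playlist entry whose track has no match in the song table, so it helps to measure how much survives. The following is a minimal sketch on toy data (the tibbles and values here are invented for illustration); the same `anti_join()` call would apply to `rectified_data` and `SONGS`.

```r
library(dplyr)

# Toy stand-ins for the playlist rows and the song characteristics table
playlist_tracks <- tibble(
  playlist_id = c(0, 0, 1),
  track_id = c("t1", "t2", "t3")
)
songs <- tibble(
  id = c("t1", "t3", "t3"),   # "t3" appears twice: one row per artist
  artist = c("A", "B", "C")
)

# Playlist entries with no matching song record
unmatched <- anti_join(playlist_tracks, songs, by = c("track_id" = "id"))

# Share of playlist entries that the inner join would keep
match_rate <- 1 - nrow(unmatched) / nrow(playlist_tracks)
match_rate
```

A low match rate would mean the statistics below describe only the subset of playlist entries that the song table covers.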

🎡 Q1: How many distinct tracks and artists?

Code
distinct_tracks <- joined_data %>% distinct(track_id) %>% nrow()
distinct_artists <- joined_data %>% distinct(artist_id) %>% nrow()

spotify_table(
  tibble(Metric = c("Distinct Tracks", "Distinct Artists"),
         Count = c(distinct_tracks, distinct_artists))
)
Metric | Count
Distinct Tracks | 50684
Distinct Artists | 9609

📝 Analysis: The dataset contains a rich collection of unique tracks and artists, showcasing Spotify's extensive catalog diversity across user playlists.

🔥 Q2: What are the 5 most common tracks?

Code
top_tracks <- joined_data %>%
  group_by(track_name) %>%
  summarise(Appearances = n(), .groups = "drop") %>%
  arrange(desc(Appearances)) %>%
  slice_head(n = 5)

spotify_table(top_tracks)
track_name | Appearances
Champions | 27888
No Problem (feat. Lil Wayne & 2 Chainz) | 26826
Closer | 25742
F**kin' Problems | 25136
Sucker For Pain (with Wiz Khalifa, Imagine Dragons, Logic & Ty Dolla $ign feat. X Ambassadors) | 25086

📝 Analysis: The most frequently appearing songs offer insight into widely loved and repeat-worthy tracks across the playlists in the dataset.
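
One caveat on these counts: the song table holds one row per artist-song pair, so the join repeats each playlist entry once per credited artist, and grouping by `track_name` also merges different songs that share a title. Raw row counts can therefore overstate how often a multi-artist track appears. The sketch below, on invented toy data, shows one way to count playlists per track instead; it is an alternative to the counting above, not the method used for the table.

```r
library(dplyr)

# Toy joined data: track "x" has two credited artists, so each playlist
# entry for it shows up twice after the join
joined <- tibble(
  playlist_id = c(1, 1, 1, 2, 2),
  track_id    = c("x", "x", "y", "x", "x"),
  track_name  = c("Duet", "Duet", "Solo", "Duet", "Duet"),
  artist      = c("A", "B", "C", "A", "B")
)

# Naive count: one per joined row
naive <- count(joined, track_name, name = "Appearances", sort = TRUE)

# De-duplicated count: one per (playlist, track) pair, keyed by track_id
appearances <- joined |>
  distinct(playlist_id, track_id, track_name) |>
  count(track_id, track_name, name = "Appearances", sort = TRUE)

naive
appearances
```

In the toy data the naive count gives "Duet" 4 appearances, while the de-duplicated count gives 2, one per playlist that contains it.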

💃 Q4: Most Danceable Track

Code
most_danceable <- SONGS %>% arrange(desc(danceability)) %>% slice_head(n = 1)

danceable_count <- rectified_data %>%
  filter(track_id == most_danceable$id) %>%
  nrow()

spotify_table(most_danceable %>% 
  select(name, artist, danceability, popularity) %>% 
  mutate(`# of Playlists` = danceable_count))
name | artist | danceability | popularity | # of Playlists
Funky Cold Medina | Tone-Loc | 0.988 | 57 | 209

📝 Analysis: With high danceability and moderate popularity, this track captures rhythmic excellence while still being somewhat niche.

⏱️ Q5: Playlist with Longest Average Track Duration

Code
longest_avg_playlist <- joined_data %>%
  group_by(playlist_name, playlist_id) %>%
  summarise(avg_duration = mean(duration, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_duration)) %>%
  slice_head(n = 1)

longest_avg_playlist %>%
  mutate(avg_duration_min = round(avg_duration / 60000, 2)) %>%
  select(playlist_name, playlist_id, avg_duration_min) %>%
  spotify_table()
playlist_name | playlist_id | avg_duration_min
Sleep | 611205 | 68.67

📝 Analysis: This playlist favors longer-form listening experiences, perfect for chill or storytelling-heavy sessions.
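
An average this high could be driven by a playlist with only one or two matched tracks, since the mean is taken over whatever rows survive the join. As a hedged sketch of a robustness check, the example below requires a minimum number of tracks before ranking; the toy values and the `min_tracks` threshold are assumptions chosen for illustration, not part of the original analysis.

```r
library(dplyr)

# Toy data: playlist 1 has a single very long track, playlist 2 has three
toy <- tibble(
  playlist_id = c(1, 2, 2, 2),
  duration    = c(4120000, 200000, 220000, 240000)  # milliseconds
)

min_tracks <- 3  # hypothetical threshold

toy_avg <- toy |>
  group_by(playlist_id) |>
  summarise(
    n_tracks = n(),
    avg_duration_min = mean(duration) / 60000,  # ms -> minutes
    .groups = "drop"
  ) |>
  filter(n_tracks >= min_tracks) |>
  slice_max(avg_duration_min, n = 1)

toy_avg
```

Reporting `n_tracks` next to the average makes it easy to tell a genuinely long-form playlist from a single-track outlier.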

⭐ Q6: Most Followed Playlist

Code
most_followed <- joined_data %>%
  select(playlist_id, playlist_name, playlist_followers) %>%
  distinct() %>%
  arrange(desc(playlist_followers)) %>%
  slice_head(n = 1)

spotify_table(most_followed)
playlist_id | playlist_name | playlist_followers
746359 | Breaking Bad | 53519

📝 Analysis: A high follower count suggests strong user trust in the playlist's curation, and playlists like this often become listening staples.

🎧 Task 7: Curate and Analyze Your Ultimate Playlist – “Hustle & Heart”

Twelve tracks. One vibe. Built from raw energy, emotional drive, and underdog spirit. Featuring rap heavyweights, slept-on gems, and genre-bending transitions, “Hustle & Heart” was crafted using 5 analytical heuristics and a whole lot of gut.

🎢 Evolution of Audio Features in ‘Hustle & Heart’ Playlist

Hustle and Heart 🎧

🧠 Note: While most tracks in Hustle & Heart were selected using a data-driven similarity score, two foundational songs, “Drop the World” and “No Role Modelz”, were manually included as thematic anchors for their lyrical intensity and motivational energy. Both appear in the data but dropped out during the popularity ranking.

# | Track | Artist | Tag
1 | Power Trip | J. Cole |
2 | Crooked Smile | J. Cole | 📉 Hidden Gem
3 | Young, Wild & Free (feat. Bruno Mars) - feat. Bruno Mars | Snoop Dogg | 📉 Hidden Gem
4 | Battle Scars | Lupe Fiasco |
5 | Mercy | Kanye West | 🧠 New Discovery
6 | Love Me | Lil Wayne | 📉 Hidden Gem
7 | Lollipop | Lil Wayne |
8 | Rock Your Body | Justin Timberlake |
9 | Beautiful Girls | Sean Kingston |
10 | A Milli | Lil Wayne |
11 | Drop the World | Eminem, Lil Wayne | 📉 Hidden Gem
12 | No Role Modelz | J. Cole | 📉 Hidden Gem
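
The post does not show how the similarity score was computed. Purely as an illustration, one common form is the Euclidean distance between standardized audio features and an anchor song. Everything in the sketch below is an assumption: the choice of features, the toy values, and the use of Euclidean distance are not taken from the original analysis.

```r
library(dplyr)

# Toy song table (invented values); "Anchor" plays the role of a seed track
songs <- tibble(
  name         = c("Anchor", "Close", "Far"),
  danceability = c(0.70, 0.72, 0.20),
  energy       = c(0.80, 0.78, 0.30),
  valence      = c(0.50, 0.55, 0.90)
)
features <- c("danceability", "energy", "valence")  # assumed feature set

# Standardize each feature so no single scale dominates the distance
scaled <- scale(as.matrix(songs[features]))
anchor <- scaled[songs$name == "Anchor", ]

# Euclidean distance from every song to the anchor
dist_to_anchor <- sqrt(rowSums(sweep(scaled, 2, anchor)^2))

ranked <- songs |>
  mutate(distance = dist_to_anchor) |>
  filter(name != "Anchor") |>
  arrange(distance)

ranked
```

Smaller distances mean a closer audio profile, so the head of `ranked` would be the candidate pool from which a playlist like this one could be drawn.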